Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

Yining Wang, Xi Chen, Yuan Zhou

Neural Information Processing Systems

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(√(T log log T)), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
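The trisection idea underlying the algorithm can be illustrated on a known unimodal function. The sketch below is a hypothetical, idealized version: it assumes exact function evaluations, whereas the paper's algorithm works with noisy revenue estimates and confidence intervals; the function name and example curve are illustrative only.

```python
def trisection_max(f, lo, hi, tol=1e-6):
    """Return an approximate maximizer of a unimodal f on [lo, hi]
    by repeatedly discarding one third of the interval."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1   # maximizer cannot lie in [lo, m1]
        else:
            hi = m2   # maximizer cannot lie in [m2, hi]
    return (lo + hi) / 2.0

# Example: a concave (hence unimodal) revenue curve peaking at 0.7.
theta_star = trisection_max(lambda t: -(t - 0.7) ** 2, 0.0, 1.0)
```

Each iteration shrinks the search interval by a factor of 2/3, which is what makes the number of (noisy) evaluations, and hence the regret, independent of the number of items.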




Categorized Bandits

Neural Information Processing Systems

In the multi-armed bandit problem, an agent has several possible decisions, usually referred to as "arms", and chooses or "pulls" sequentially one of them at each time step. This generates a sequence of rewards and the objective is to maximize their cumulative sum.
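The pull-observe-accumulate protocol described above can be sketched with one standard strategy, UCB1, on Bernoulli arms. This is a generic illustration of the bandit setting, not the algorithm of any particular paper; the arm means and horizon are made-up example values.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 for `horizon` steps on Bernoulli arms with the given means.
    Returns the cumulative reward and the per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull each arm once to initialize estimates
        else:
            # choose the arm with the highest upper confidence bound
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arm_means[a] else 0.0  # observed reward
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Over time the pull counts concentrate on the arm with the highest mean, which is exactly how the cumulative sum of rewards is kept close to the best achievable.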





Fair Algorithms for Multi-Agent Multi-Armed Bandits

Neural Information Processing Systems

Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness.
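The Nash social welfare of a distribution over arms is the product of the agents' expected utilities under that distribution. The sketch below assumes the per-agent, per-arm utilities are known, which is an idealization; the function and the two-agent example are hypothetical, chosen only to show why a mixed distribution can be fairer than any single arm.

```python
import math

def nash_social_welfare(p, utilities):
    """p: distribution over arms; utilities[agent][arm]: that agent's
    utility for the arm. Returns the product of expected utilities."""
    expected = [sum(pi * u_arm for pi, u_arm in zip(p, u)) for u in utilities]
    return math.prod(expected)

# Two agents with completely opposed preferences over two arms:
# either pure arm gives one agent nothing, so its NSW is zero,
# while the uniform mixture gives both agents utility 0.5.
utils = [[1.0, 0.0],
         [0.0, 1.0]]
nsw_mixed = nash_social_welfare([0.5, 0.5], utils)  # 0.5 * 0.5 = 0.25
nsw_pure = nash_social_welfare([1.0, 0.0], utils)   # 1.0 * 0.0 = 0.0
```

Because the product vanishes whenever any agent's expected utility is zero, maximizing Nash social welfare forces the learned distribution to give every agent a nontrivial share.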